Document upcoming MXNet training script format #390

Merged
20 commits merged into aws:master from the mxnet-script-mode-readme-warning branch on Sep 18, 2018

Conversation

laurenyu (Contributor)

Description of changes:
This change adds a warning to the README about the upcoming MXNet training script format, along with basic instructions on what changes will be needed once the new format is released.

I didn't use a ReST admonition for the warning because GitHub won't render them.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
In addition, the locations for finding input data and saving the model and output data will need to be defined.
Contributor

We pass these to the container as environment variables. I don't think users can override these in their training script. They can specify the S3 locations for these, but since we handle downloading the data and mounting the shared partition for the container, they have to use the paths in the environment variables. I think we should explain this a little better and add a reference to the environment variables in sagemaker-containers. There is a list in its README.

Contributor Author

We store the locations in environment variables, but we do rely on the user to read those environment variables in their script, so they do have to define variables with those locations. I can add an explanation about the environment variables, though.
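
For illustration, a minimal sketch of what reading those locations might look like in a user's script. The SM_* variable names below follow the list in the sagemaker-containers README; the 'train' channel name is an assumption and depends on how the estimator's channels are configured.

import os

# locations the container exposes as environment variables
model_dir = os.environ['SM_MODEL_DIR']              # where to save model artifacts
output_data_dir = os.environ['SM_OUTPUT_DATA_DIR']  # where to write output data
train_dir = os.environ['SM_CHANNEL_TRAIN']          # input data for a channel named 'train'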

Contributor

Suggest:

In addition, you need to define the locations of where to get the input data and where to save the model artifacts and output data.

parser.add_argument('--learning-rate', type=float, default=0.1)

# input data and model directories
parser.add_argument('--model-dir', type=str, default='/opt/ml/model')
Contributor

We should use the environment variables to set the args here as well.
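
As a rough sketch of that suggestion, with the SM_* environment variable names assumed from sagemaker-containers (the '--train' argument and its 'train' channel are hypothetical):

import argparse
import os

parser = argparse.ArgumentParser()

# hyperparameters arrive as command-line arguments
parser.add_argument('--learning-rate', type=float, default=0.1)

# input data and model directories, defaulted from the environment
# variables set by the container
parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))

args, _ = parser.parse_known_args()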

icywang86rui previously approved these changes Sep 17, 2018
@@ -599,13 +599,16 @@ The code executed from your main guard needs to:
3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
Contributor

Suggest replacing "will be" with "are".

@@ -599,13 +599,16 @@ The code executed from your main guard needs to:
3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
-In addition, the locations for finding input data and saving the model and output data will need to be defined.
+In addition, the locations for finding input data and saving the model and output data will be provided as environment variables rather than as arguments to a function.
Contributor

Suggestion:

In addition, you specify the locations of input data and where to save the model artifacts and output data as environment variables in the container, rather than as arguments to a function.

Contributor Author

The container provides this info to the user, not the other way around. Would this be clearer?

In addition, the container will define the locations of input data and where to save the model artifacts and output data as environment variables rather than passing that information through train.

Contributor

Minor change:

...rather than passing that information as arguments to the train function.

2. Initiate training
3. Save the model

Hyperparameters will be passed as command-line arguments to your training script.
Contributor

Suggest replacing "will be" with "are"

Contributor Author

I used future tense because these instructions are going to be live for a while before the changes themselves are released - I'm afraid present tense might be too confusing

Contributor

Fair enough

args, _ = parser.parse_known_args()

The code in the main guard should also take care of training and saving the model.
This can be as simple as just calling the methods used with the previous training script format:
Contributor

Suggest:

This can be as simple as calling the train and save methods used in the previous training script format.

Contributor Author

changed
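
For context, a minimal sketch of such a main guard. The train and save functions are hypothetical stand-ins for whatever the user's existing script already defines, and the SM_* environment variable names are assumed from sagemaker-containers.

import argparse
import os


def train(learning_rate, train_dir):
    ...  # user's existing training logic goes here


def save(model, model_dir):
    ...  # user's existing model-saving logic goes here


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning-rate', type=float, default=0.1)
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    args, _ = parser.parse_known_args()

    # wire the parsed arguments into the existing train/save functions
    model = train(args.learning_rate, args.train)
    save(model, args.model_dir)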

eslesar-aws previously approved these changes Sep 17, 2018
laurenyu merged commit cb184cb into aws:master Sep 18, 2018
laurenyu deleted the mxnet-script-mode-readme-warning branch September 18, 2018 16:36
pdasamzn pushed a commit to pdasamzn/sagemaker-python-sdk that referenced this pull request Nov 1, 2018
The next major release of MXNet will change the
training script format. This README change documents the
changes needed by the user to adjust to the new format.
This is currently just a warning as the new format is not out yet.
The warning is meant to help users plan for the upcoming change.